18.05 Lecture 35 
May 9, 2005 



Kolmogorov-Smirnov (KS) goodness-of-fit test 
The chi-square test is used with discrete distributions.

If the distribution is continuous, split its range into intervals and treat it as discrete.

This makes the hypothesis weaker, however, because the interval probabilities do not characterize the distribution fully.
The KS test uses the entire distribution, so it can detect any departure from the hypothesized c.d.f.

Hypothesis Test:
$H_1: \mathbb{P} = \mathbb{P}_0$

$H_2: \mathbb{P} \neq \mathbb{P}_0$

$\mathbb{P}_0$ is continuous.

In this test, the c.d.f. is used. 

Reminder: the c.d.f. $F(x) = \mathbb{P}(X \le x)$ goes from 0 to 1.














The c.d.f. describes the entire distribution.
Approximate the c.d.f. from the data:



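Empirical Distribution Function:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x) \xrightarrow{n \to \infty} \mathbb{E}\, I(X_1 \le x) = \mathbb{P}(X_1 \le x) = F(x) \quad \text{(by LLN)}$$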
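As a minimal sketch of the empirical distribution function defined here, the Python below counts the fraction of sample points at or below x. The helper name `ecdf` and the four-point sample are illustrative assumptions, not part of the lecture.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function F_n with F_n(x) = (number of X_i <= x) / n."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right returns how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F_n

# Hypothetical sample, chosen only for illustration
F_n = ecdf([0.2, 0.5, 0.5, 0.9])
print(F_n(0.1), F_n(0.2), F_n(0.5), F_n(1.0))  # prints 0.0 0.25 0.75 1.0
```

Each distinct data point raises the function by 1/n times the number of observations equal to it, which is why the repeated value 0.5 produces a jump of 2/4.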







The empirical c.d.f. built from the data jumps by 1/n at each data point. It converges to the true c.d.f. as n grows.
Find the largest difference (supremum) between the empirical (step) c.d.f. and the actual c.d.f.



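$$\sup_x |F_n(x) - F(x)| \to 0 \quad \text{as } n \to \infty$$

For a fixed x, by the central limit theorem:

$$\sqrt{n}\,(F_n(x) - F(x)) = \sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}\big(I(X_i \le x) - \mathbb{E}\, I(X_1 \le x)\big)\right) \xrightarrow{d} N\big(0, \mathrm{Var}(I(X_1 \le x))\big)$$

where $\mathrm{Var}(I(X_1 \le x)) = p(1 - p) = F(x)(1 - F(x))$, with $p = F(x)$.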
You can tell exactly how close the values should be! 
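To make the fixed-x statement concrete, here is a small simulation sketch. It assumes Uniform[0, 1] data (so F(x) = x) and compares the sample variance of sqrt(n)(F_n(x) - F(x)) with F(x)(1 - F(x)). The values n = 200, x = 0.3, 2000 repetitions and the seed are arbitrary choices made for illustration.

```python
import math
import random

def scaled_deviation(n, x, rng):
    """Draw n Uniform[0,1] points and return sqrt(n) * (F_n(x) - F(x)), where F(x) = x."""
    count = sum(1 for _ in range(n) if rng.random() <= x)
    return math.sqrt(n) * (count / n - x)

def simulate_variance(n=200, x=0.3, reps=2000, seed=0):
    """Sample variance of the scaled deviation over `reps` independent samples."""
    rng = random.Random(seed)
    values = [scaled_deviation(n, x, rng) for _ in range(reps)]
    mean = sum(values) / reps
    return sum((v - mean) ** 2 for v in values) / (reps - 1)

x = 0.3
print("simulated variance:", simulate_variance(x=x))
print("F(x)(1 - F(x))    :", x * (1 - x))
```

The two printed numbers should be close to each other, up to simulation noise.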

$$D_n = \sqrt{n}\,\sup_x |F_n(x) - F(x)|$$

a) Under $H_1$, $D_n$ has some proper known distribution.

b) Under $H_2$, $D_n \to +\infty$.

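Suppose the true c.d.f. $F$ is $\delta$ away, at some point x, from the c.d.f. $F_0$ predicted by the null hypothesis. Since $F_n(x) \to F(x)$, for large n:

$$|F_n(x) - F_0(x)| > \delta/2$$
$$\sqrt{n}\,|F_n(x) - F_0(x)| > \frac{\sqrt{n}\,\delta}{2} \to +\infty$$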

The distribution of $D_n$ does not depend on $F(x)$; this allows us to construct the KS test.
$$D_n = \sqrt{n}\,\sup_x |F_n(x) - F(x)| = \sqrt{n}\,\sup_y |F_n(F^{-1}(y)) - y|$$
where $y = F(x)$, $x = F^{-1}(y)$, $y \in [0, 1]$.



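$$F_n(F^{-1}(y)) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le F^{-1}(y)) = \frac{1}{n}\sum_{i=1}^{n} I(F(X_i) \le y) = \frac{1}{n}\sum_{i=1}^{n} I(Y_i \le y)$$

where $Y_i = F(X_i)$. The distribution of the $Y_i$ values does not depend on $F$:

$$\mathbb{P}(Y_i \le y) = \mathbb{P}(F(X_i) \le y) = \mathbb{P}(X_i \le F^{-1}(y)) = F(F^{-1}(y)) = y$$

So if $X_i \sim F(x)$, then $F(X_i) \sim$ uniform on [0, 1], whatever $F$ is.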
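The change of variables can be checked numerically. The sketch below assumes, purely for illustration, that F is the exponential c.d.f. with rate 1. It computes the statistic once from the X values against F and once from Y = F(X) against the uniform c.d.f. The helper names are hypothetical.

```python
import math
import random

def ks_statistic(sample, cdf):
    """D_n = sqrt(n) * sup_x |F_n(x) - F(x)|, checked just before and after each jump."""
    xs = sorted(sample)
    n = len(xs)
    sup = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # empirical c.d.f. is i/n just before the i-th sorted point and (i+1)/n at it
        sup = max(sup, abs(i / n - f), abs((i + 1) / n - f))
    return math.sqrt(n) * sup

def exp_cdf(x):
    # c.d.f. of the exponential distribution with rate 1 (illustrative choice of F)
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def unif_cdf(y):
    # c.d.f. of the uniform distribution on [0, 1]
    return min(max(y, 0.0), 1.0)

rng = random.Random(1)
xs = [rng.expovariate(1.0) for _ in range(50)]  # X_i ~ F
ys = [exp_cdf(x) for x in xs]                   # Y_i = F(X_i)

d_x = ks_statistic(xs, exp_cdf)   # statistic from X against F
d_y = ks_statistic(ys, unif_cdf)  # statistic from Y against Uniform[0, 1]
print(d_x, d_y)
```

The two values agree, since the statistic only sees the data through F(X_i).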

The distribution of $D_n$ is tabulated for different values of n, since it does not depend on the underlying distribution
(see the table on pg. 570).

For large n, the distribution of $D_n$ converges to a limiting distribution, whose table you can use instead:
$$\mathbb{P}(D_n \le t) \to H(t) = 1 - 2\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}$$
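In place of a table, the limiting distribution can be approximated by truncating its series. This is a minimal sketch; the function name `kolmogorov_cdf` and the cutoff of 100 terms are assumptions made here, not something from the lecture.

```python
import math

def kolmogorov_cdf(t, terms=100):
    """Approximate H(t) = 1 - 2 * sum_{i>=1} (-1)^(i-1) * exp(-2 i^2 t^2).

    The series is truncated after `terms` terms; this is accurate unless t is
    extremely close to 0, where the terms decay slowly.
    """
    if t <= 0:
        return 0.0
    total = 0.0
    for i in range(1, terms + 1):
        total += (-1) ** (i - 1) * math.exp(-2.0 * i * i * t * t)
    return 1.0 - 2.0 * total

for t in (0.5, 1.0, 1.35, 2.0):
    print(t, round(kolmogorov_cdf(t), 4))
```

For moderate t only the first term or two matter, because the exponents grow like i squared.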

The function $H(t)$ comes from the Brownian motion of a particle suspended in liquid (strictly, a Brownian bridge: motion tied to return to its starting point).
Consider the distance the particle travels from its starting point. The distribution of the maximum such distance is the limiting distribution of $D_n$:

$H(t)$ = distribution of the largest deviation of the particle in liquid (Brownian motion).
Decision Rule: 

$$\delta = \{H_1: D_n \le c;\ H_2: D_n > c\}$$

Choose c such that the area to the right of c under the distribution of $D_n$ is equal to the level of significance $\alpha$.
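One way to find such a c without a table is to solve 1 - H(c) = alpha numerically using the large-n limit H(t). The sketch below uses bisection. The function names, the search interval [0.2, 5] and the tolerance are assumptions chosen for illustration, and the result is only the asymptotic threshold, not the exact small-n table value.

```python
import math

def kolmogorov_cdf(t, terms=100):
    """Truncated series for H(t) = 1 - 2 * sum (-1)^(i-1) * exp(-2 i^2 t^2)."""
    if t <= 0:
        return 0.0
    total = sum((-1) ** (i - 1) * math.exp(-2.0 * i * i * t * t)
                for i in range(1, terms + 1))
    return 1.0 - 2.0 * total

def critical_value(alpha, lo=0.2, hi=5.0, tol=1e-8):
    """Find c with 1 - H(c) = alpha by bisection (H is increasing in t)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - kolmogorov_cdf(mid) > alpha:
            lo = mid  # tail area still larger than alpha: move right
        else:
            hi = mid  # tail area at most alpha: move left
    return (lo + hi) / 2.0

print(round(critical_value(0.05), 2))
```

For alpha = 0.05 this gives a threshold of about 1.36.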














Example: 

Set of data points as follows, n = 10:

0.58, 0.42, 0.52, 0.33, 0.43, 0.23, 0.58, 0.76, 0.53, 0.64

$H_1$: $\mathbb{P}$ is uniform on [0, 1]

Step 1: Arrange in increasing order. 

0.23, 0.33, 0.42, 0.43, 0.52, 0.53, 0.58, 0.58, 0.64, 0.76

Step 2: Find the largest difference. 

Compare the c.d.f. with data. 

Note: largest difference will occur before or after the jump, so only consider end points. 



x:               0.23   0.33   0.42   ...
F(x):            0.23   0.33   0.42   ...
F_n(x) before:   0      0.1    0.2    ...
F_n(x) after:    0.1    0.2    0.3    ...

Calculate the differences $|F_n(x) - F(x)|$:

F_n(x) before and F(x):   0.23   0.23   0.22   ...
F_n(x) after and F(x):    0.13   0.13   0.12   ...

The largest difference occurs near the end: $|0.9 - 0.64| = 0.26$

$$D_n = \sqrt{10}\,(0.26) = 0.82$$



Decision Rule: 

$$\delta = \{H_1: 0.82 \le c;\ H_2: 0.82 > c\}$$
c for $\alpha = 0.05$ is 1.35.
Since 0.82 < 1.35, the conclusion is to accept $H_1$.
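The steps of this example can be reproduced with a short Python sketch. It uses the ten data points and the threshold c = 1.35 quoted in the lecture; the variable names and the `uniform_cdf` helper are assumptions introduced for illustration.

```python
import math

data = [0.58, 0.42, 0.52, 0.33, 0.43, 0.23, 0.58, 0.76, 0.53, 0.64]

def uniform_cdf(x):
    # c.d.f. of the uniform distribution on [0, 1] (the hypothesized F)
    return min(max(x, 0.0), 1.0)

# Step 1: arrange in increasing order
xs = sorted(data)
n = len(xs)

# Step 2: largest difference, checked just before and just after each jump
sup = 0.0
for i, x in enumerate(xs):
    before = abs(i / n - uniform_cdf(x))
    after = abs((i + 1) / n - uniform_cdf(x))
    sup = max(sup, before, after)

d_n = math.sqrt(n) * sup

# Decision rule with the threshold for alpha = 0.05 quoted in the lecture
c = 1.35
decision = "accept H1" if d_n <= c else "reject H1"
print(round(sup, 2), round(d_n, 2), decision)  # prints 0.26 0.82 accept H1
```

Indexing the sorted points by position treats the repeated value 0.58 as two separate jumps of 1/n. That does not change the maximum here, which is attained at x = 0.64.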



End of Lecture 35